Abstract. In this paper we describe a Bayesian inference framework for analysis of 
data obtained by LISA. We set up a model for binary inspiral signals as defined for 
the Mock LISA Data Challenge 1.2 (MLDC), and implemented a Markov chain Monte 
Carlo (MCMC) algorithm to facilitate exploration and integration of the posterior 
distribution over the 9-dimensional parameter space. Here we present intermediate 
results showing how, using this method, information about the 9 parameters can be 
extracted from the data. 
1. Introduction 



Once the LISA gravitational wave observatory is launched and operational, it is certain 
to measure a vast number of signals from a wide range of sources. Because the 
data will contain a superposition of individually modulated signals blended with noise, 
sophisticated methods will be required to disentangle individual signals and consistently 
infer their parameters. Bayesian inference provides a means to approach such complex 
problems, allowing one to quantify the information that is buried in the data in a 
coherent manner [1, 2, 3]. We are convinced that these techniques will be useful, if not 
essential, to analyse the data that will be obtained through LISA. Bayesian procedures, 
in conjunction with Markov chain Monte Carlo (MCMC) methods, have successfully 
been applied for the analysis of ground-based GW measurements [4, 5, 6], as well as 
in the context of LISA [7, 8], and in particular in the presence of source confusion [9]. 
Related work on MCMC methods for LISA inspiral analysis can also be found in [10, 11]. 

The authors have gathered as the 'Global LISA Inference Group' (GLIG) and have 
set out to implement such an analysis framework for LISA data. We have developed 
some generic code modules providing vital components that can be adapted to allow 
analysis given different model specifications. In response to the first round of the Mock 
LISA Data Challenges (MLDC) [12], we present the results of an analysis targeted at 
binary inspiral signals as addressed in MLDC Challenge 1.2. Within the same context, 
we approached MLDC Challenge 1.1, containing white dwarf binary systems. These 
results are presented in [13]. 

We implemented an MCMC sampler to perform the integration of the posterior 
probability distribution of the 9 parameters that determine the waveform of a binary 
inspiral GW signal, and present results illustrating the parameter information that can 
be extracted from the data. Due to the tight schedule we did not manage to enhance 
the MCMC sampler's convergence capabilities sufficiently to get results for the 'blind 
search' data as well. For now we present results for the 'training' data only. 

2. Inference framework 

2.1. The Bayesian approach 

We use a Bayesian framework to perform inference on gravitational wave signals observed 
by LISA, aiming for the information about parameters that can be derived from the data. 
Information about parameters here is formulated in terms of probability distributions 
over the parameter space. First the prior knowledge about the parameters $\vartheta$ needs to 
be properly specified in the prior distribution $p(\vartheta)$. Then parameters and data $y$ 
are linked by defining the likelihood $p(y|\vartheta)$ that describes how the observables come 
about for given parameter values. Inference is eventually done via the parameters' 
posterior distribution $p(\vartheta|y)$, which expresses the information about the parameter 
values conditional on the observed data. The posterior distribution is given by 
$p(\vartheta|y) \propto p(\vartheta)\, p(y|\vartheta)$, as a consequence of Bayes' theorem [1, 2, 3]. Inference on the 
measured signal's parameters (or other properties) requires integration of the posterior 
distribution over the parameter space, since one is usually interested in determining 
figures like posterior expectations, marginal (posterior) densities, or credibility regions. 
We approach the problem using Monte Carlo integration, for which we implemented a 
Markov chain Monte Carlo (MCMC) algorithm. Ultimately, the algorithm should 
reliably find the global mode(s) of the posterior distribution and then 
perform the integration, i.e. sample from the posterior [3, 14]. 
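
To illustrate what Monte Carlo integration means in practice, the following minimal sketch computes a posterior expectation, a standard deviation and a central credibility interval from a set of draws. It is not the code used for the analysis: the function name `posterior_summaries` is hypothetical, and independent Gaussian draws stand in for actual MCMC output.

```python
import random
import statistics


def posterior_summaries(samples, level=0.99):
    """Summarise a 1-D marginal posterior from a list of draws.

    Returns (mean, standard deviation, central credibility interval),
    where the interval is read off the empirical quantiles.
    """
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int(round((1.0 - tail) * (n - 1)))]
    return statistics.fmean(ordered), statistics.stdev(ordered), (lower, upper)


# Stand-in for MCMC output (assumption): independent Gaussian draws.
rng = random.Random(1)
draws = [rng.gauss(0.2, 0.001) for _ in range(20000)]
mean, sd, (lo, hi) = posterior_summaries(draws)
print(mean, sd, lo, hi)
```

Each summary is an average or quantile over the draws, so its accuracy is limited by the number of effectively independent samples the sampler delivers.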

2.2. Data and parameters 

A gravitational wave (GW) signal is measured by LISA by monitoring the changes in 
proper distance between the three satellites as they orbit the Sun. The data are 
sampled every 15 seconds, which is also about the time it takes for a photon to travel 
from one satellite to another. The measured response is not a simple '1:1' mapping of 
the signal waveform to the data, especially when the signal wavelength is of the order of or 
below LISA's arm length. Moreover, as LISA orbits, the response will also be modulated 
by Doppler effects and be affected by the change in the baseline orientations over time. 

The data produced by the spacecraft trio are combined to form three time-delay- 
interferometry (TDI) variables, X, Y and Z [15]. These can be linearly recombined into 
three stochastically independent components, out of which two are sensitive to GW 
signals (A and E) and one component is only noise (T) [16]. In the following we will 
only be concerned with the former two variables, A and E. 
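
As an illustration of such a linear recombination, the sketch below uses one commonly quoted orthonormal convention for A, E and T. The coefficients are an assumption here, since the text does not spell out the convention used; other normalisations are in use, and the combinations are only uncorrelated if the X, Y and Z noises have equal spectra and equal cross-spectra. The function name `aet_from_xyz` is hypothetical.

```python
import math


def aet_from_xyz(x, y, z):
    """Combine TDI X, Y, Z into A, E, T (one common convention, assumed).

    Works on plain floats and, element-wise, on NumPy arrays.
    """
    a = (z - x) / math.sqrt(2.0)
    e = (x - 2.0 * y + z) / math.sqrt(6.0)
    t = (x + y + z) / math.sqrt(3.0)
    return a, e, t


# A component common to all three channels ends up in T only.
print(aet_from_xyz(1.0, 1.0, 1.0))
```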

In the restricted 2.0 PN approximation, the 9 parameters defining a binary 
inspiral's GW signal measured by LISA are chirp mass ($m_c$), mass ratio ($\eta$), coalescence 
phase ($\phi_c$), coalescence time ($t_c$), ecliptic latitude ($\theta$), ecliptic longitude ($\varphi$), luminosity 
distance ($D$), polarisation ($\psi$) and inclination angle ($\iota$) [17]. 
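
For orientation, the two mass parameters can be related to the component masses. The sketch below assumes the usual definitions of chirp mass and symmetric mass ratio, which the text does not state explicitly; the function names are hypothetical.

```python
import math


def chirp_mass(m1, m2):
    """Chirp mass (m1*m2)^(3/5) / (m1+m2)^(1/5), assumed definition."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def symmetric_mass_ratio(m1, m2):
    """Symmetric mass ratio m1*m2 / (m1+m2)^2, assumed definition."""
    return m1 * m2 / (m1 + m2) ** 2


def component_masses(mc, eta):
    """Invert the two definitions above; returns (m1, m2) with m1 >= m2."""
    total = mc * eta ** -0.6
    root = math.sqrt(max(0.0, 1.0 - 4.0 * eta))
    return 0.5 * total * (1.0 + root), 0.5 * total * (1.0 - root)


# Round trip for illustrative values of chirp mass (solar masses) and mass ratio.
m1, m2 = component_masses(1023866.0, 0.1995)
print(m1, m2, chirp_mass(m1, m2), symmetric_mass_ratio(m1, m2))
```

Under these definitions the mass ratio cannot exceed 0.25, which is reached for equal masses.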

2.3. Model 

For given data $y$ from a single TDI variable the likelihood function is defined as a 
function of the parameters $\vartheta$ by 



where $\tilde{y}$ and $\tilde{s}$ are the (numerical) discrete Fourier transforms of data and signal 
waveform respectively, and $S_n$ is the variable's (one-sided) noise spectral density [18]. 
The data ($y$) going into the likelihood here are the 'A' and 'E' TDI variables [16]. 
Assuming that these are stochastically independent, the likelihood then is the product 
of the individual variables' likelihoods. The signal waveform $s$ to which the data 
are matched is the corresponding TDI response to the GW signal implied by the 
parameters $\vartheta$. The noise spectrum $S_n$ refers to the noise in the corresponding (A and 
E) TDI variables. 
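
A frequency-domain Gaussian likelihood of this kind can be sketched as follows. This is a minimal illustration, not the analysis code: the normalisation (a factor $2\,\Delta t/N$ with an unnormalised DFT and a one-sided spectral density) is an assumption chosen to match the time-domain result for white noise, and the function names and argument layout are hypothetical.

```python
import numpy as np


def log_likelihood_channel(data, signal, psd, dt):
    """Log-likelihood, up to an additive constant, for one TDI variable.

    data, signal : time series of equal length N, sampled every dt seconds
    psd          : one-sided noise spectral density at np.fft.rfftfreq(N, dt)
    """
    n = len(data)
    resid = np.fft.rfft(np.asarray(data, float) - np.asarray(signal, float))
    # The zero-frequency bin is skipped; the normalisation is an assumption.
    power = np.abs(resid[1:]) ** 2
    return -2.0 * dt / n * np.sum(power / np.asarray(psd, float)[1:])


def log_likelihood(channels, dt=15.0):
    """Sum over independent channels, e.g. [(yA, sA, SnA), (yE, sE, SnE)]."""
    return sum(log_likelihood_channel(y, s, p, dt) for y, s, p in channels)


# White-noise check: one-sided PSD of unit-variance white noise is 2*dt.
rng = np.random.default_rng(0)
noise = rng.normal(size=1024)
flat_psd = np.full(513, 2.0 * 15.0)
print(log_likelihood([(noise, np.zeros(1024), flat_psd)]))
print(-0.5 * np.sum(noise ** 2))  # time-domain value, for comparison
```

Summing the per-channel log-likelihoods corresponds to multiplying the A and E likelihoods, as described above.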

3. Implementation 

In order to infer the measured signal's parameters, one first needs to find the global 
mode(s) in the posterior distribution, and then also to 'explore' the mode(s), i.e. 
simulate posterior samples. We tackle that problem using MCMC methods, an approach 
that in particular requires many likelihood evaluations. Likelihood evaluations are 
computationally expensive, since they require several time-consuming steps: given a 
parameter set one first needs to compute the +/x polarisation waveforms emitted by 
the inspiral event, then the TDI response of the LISA interferometer to the GW signal, 
and finally its Fourier transform, before parameters can be related to the data through 
the likelihood. Since most of these (and more) steps are common between a wide range 
of different types of analyses, we set up our software in a modular style so that parts 
would be reusable and shareable in the form of modules. See our accompanying paper [13] 
for another application that shares parts of the same code. So far, this also includes a 
common framework to store and manipulate data internally, an interface to the lisaXML 
data format [17], and the availability of Fourier transformation and spectrum estimation 
capabilities based on the FFTW library [19]. Most importantly, the derivation of LISA's 
response (in terms of X/Y/Z or A/E TDI variables) to a gravitational wave signal (given 
in terms of +/x polarisations, direction of source, and polarisation angle) was needed. 
Here we resorted to the LISA Simulator [20, 21], which was also used for the generation 
of the MLDC data, and which is coded in C, allowing us to easily incorporate it into 
our code and stay consistent with the provided data. Originally, the LISA Simulator 
was not intended to do its computations repeatedly and quickly, and it was possible to 
speed up the code by storing and reusing some intermediate steps. We implemented a 
simple Metropolis algorithm [3, 14] to do inference on binary inspiral signals as defined 
for MLDC challenge 1.2.1 [17]. The eventual computation speed, depending on how 
much data are processed, is shown in table 1. A similar implementation has proven 
successful in the context of ground-based GW measurements [6], and we are currently 
working on tuning and extending the basic algorithm. 

Table 1. Computation speed of the MCMC code for different amounts of data (on an 
Intel Xeon 2.4 GHz processor). Most of the computation time (more than 95%) goes 
into deriving A/E TDI responses from the +/x GW waveforms. 

    amount of data          seconds 
    days     samples        per iteration 

    364      $2^{21}$       146 
    182      $2^{20}$        75 
     91      $2^{19}$        38 
     46      $2^{18}$        19 
     23      $2^{17}$        10 
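
The basic Metropolis scheme referred to above can be sketched as follows. This is a generic random-walk Metropolis sampler under stated assumptions (symmetric Gaussian proposals with fixed step sizes, a toy two-parameter target), not the implementation used for the analysis; all names are hypothetical.

```python
import math
import random


def metropolis(log_post, start, step, n_iter, rng):
    """Random-walk Metropolis sampler with symmetric Gaussian proposals.

    log_post : function returning the log posterior density (up to a constant)
    start    : initial parameter vector
    step     : proposal standard deviation for each parameter
    """
    current = list(start)
    current_lp = log_post(current)
    chain = []
    for _ in range(n_iter):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(current, step)]
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(list(current))
    return chain


def toy_log_post(theta):
    """Toy target (assumption): independent unit Gaussians at (1, -2)."""
    return -0.5 * ((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


chain = metropolis(toy_log_post, [0.0, 0.0], [1.0, 1.0], 5000, random.Random(0))
print(sum(c[0] for c in chain) / len(chain))
```

In the actual application every call to `log_post` involves a waveform, TDI response and Fourier transform computation, which is why the cost per iteration dominates the run time.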

4. Results 

We applied the above framework to the data in MLDC challenge 1.2.1. The signal 
waveform was generated following the description given in [17], and we estimated the 
A/E variables' noise spectral densities based on the section of data where the signal was 
absent. 

We ran the code on the 'training' data set, starting from the true parameter 
values, and, due to the low computation speed, only considered the last $2^{17}$ samples 
(corresponding to 23 days of measurements) before coalescence for the analysis. The 
resulting MCMC sampler was still rather slow, producing a posterior sample 
every 10 seconds. By only considering the last part of the signal before coalescence 
we are of course neglecting some information, but since the SNR of the injected signal 
was very high (almost 500), and most of that is accumulated in the last phase immediately 
before coalescence, we will still be left with a high SNR. On the other hand, we will 
especially lose information about the location parameters ($\theta$, $\varphi$), since these are encoded 
in the long-term evolution of the signal, so we might find an increased degeneracy 
between these two parameters. As the efficiency of our code continues to improve we 
will of course be analysing larger sections of data. 
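
The correspondence between sample counts and observation time follows directly from the 15-second sampling interval. A small check, with hypothetical names:

```python
SAMPLING_INTERVAL_S = 15.0  # sampling interval stated in the text
SECONDS_PER_DAY = 86400.0


def span_days(n_samples):
    """Length of a data segment in days for a given number of samples."""
    return n_samples * SAMPLING_INTERVAL_S / SECONDS_PER_DAY


for exponent in range(21, 16, -1):
    print(f"2^{exponent} samples: {span_days(2 ** exponent):.1f} days")
```

For $2^{17}$ samples this gives about 22.8 days, i.e. the 23 days quoted above.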

Figure 1. Marginal posterior densities for all 9 parameters. Dashed lines indicate the 
true values. 



Figure 1 shows the marginal posterior distributions for all 9 individual parameters 
in comparison to the true parameter values (there is no true value shown for $\phi_c$, 
since we are using a different parametrisation: coalescence phase instead of initial 
phase). The true parameter values are all well covered by the posterior distribution, 
not only in these 1-dimensional projections, but also for all bivariate distributions, 
two examples of which are shown in figure 2; this demonstrates the consistency of the 
applied inference framework. Table 2 shows some summary statistics characterizing the 
posterior distribution, and relating it to the true parameter values. As one can see from 
figure 2, there is much posterior correlation, or degeneracy, between the parameters. 
In particular, there are two groups of parameters that are highly correlated with each 
other: firstly the two mass parameters, coalescence time and phase ($m_c$, $\eta$, $t_c$, $\phi_c$), 
and secondly, the two sky location parameters and the luminosity distance ($\theta$, $\varphi$, $D$). 
Correlation coefficients of parameter pairs within these groups are as high as 0.90-0.99, 
which greatly complicates sampling from the posterior. 
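
To give a feeling for why such correlations hamper sampling, the sketch below computes a correlation coefficient from paired samples and, assuming an approximately bivariate Gaussian posterior, the factor by which the conditional spread of one parameter is narrower than its marginal spread. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def conditional_sd_fraction(rho):
    """Conditional over marginal standard deviation (Gaussian assumption)."""
    return math.sqrt(1.0 - rho ** 2)


for rho in (0.90, 0.99):
    print(rho, conditional_sd_fraction(rho))
```

Under this approximation, at a correlation of 0.99 a proposal that changes one parameter while holding its partner fixed has to be about seven times smaller than the marginal width to be accepted at a reasonable rate, so the chain moves slowly along the degenerate direction.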

We also ran the code on the 'blind' challenge 1.2.1 data set, for which we did 
not know the true parameter values, but due to the code's speed and the size of the 
parameter space it did not converge and produce results before the submission 
deadline for MLDC round 1. 

Figure 2. Marginal joint posterior densities and 99% credibility regions for two pairs 
of parameters. Dashed lines indicate the true values. 

Table 2. Some summary statistics characterizing the marginal posterior distributions 
for individual parameters. Mean and standard deviation describe location and 
accuracy, and the 99% central credibility intervals (c.c.i.) contain the corresponding parameter 
with 99% probability, given the data at hand. 

                                 mean      st. dev.   99% c.c.i.            true      unit 

    chirp mass $m_c$             1023564   215        (1023033, 1024054)    1023866   $M_\odot$ 
    mass ratio $\eta$            0.1979    0.0013                           0.1995 
    coalescence phase $\phi_c$   2.97      0.14       (2.63, 0.13)                    rad 
    coalescence time $t_c$ 
    …                                                                                 rad 

While the MCMC algorithm is working in principle and performing the posterior 
integration, more tuning is necessary to enhance its optimisation properties, i.e. its 
capability of finding modes by itself, and its efficiency in manoeuvring through 
parameter space. Better convergence properties are crucial not only to enhance the 
algorithm's overall applicability, but also to ensure that no further posterior 
modes of potential relevance are missed. 

5. Conclusions 

We have presented a Bayesian inference framework for analysis of GW signals as 
measured by LISA. We ran a basic MCMC algorithm on data simulating a binary 
inspiral measurement from the first round of the Mock LISA Data Challenges (MLDC). 
In a related effort [13], sharing parts of the same code, we applied a similar model to 
the analysis of signals from white dwarf binary systems. The MCMC implementation 
so far is a simple Metropolis algorithm, and the results illustrate that this approach 
ultimately allows one to extract and express the information about signal parameters 
contained in the data in a coherent manner. While the integration of the posterior 
distribution over the parameter space is fully functional, more work needs to be done 
on the MCMC sampler's optimisation capabilities as well as its efficiency. We are 
working on a preprocessing stage to the MCMC algorithm to provide rough parameter 
estimates as starting values for the MCMC sampler. We are also currently extending the 
Metropolis-sampler to a parallel tempering algorithm [HI [6] in a parallel implementation 
[22]. The underlying model will also need to be generalised by including the noise 
spectrum as an unknown, which might just mean the introduction of an additional 
'Gibbs step' in the MCMC sampler [3, 14]. 
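
For readers unfamiliar with parallel tempering, the step that distinguishes it from a plain Metropolis sampler is the exchange of states between chains run at different temperatures. The sketch below shows one such swap proposal under the assumption that only the likelihood is tempered (chain i targets the prior times the likelihood raised to the power beta_i = 1/T_i); it is a generic illustration with hypothetical names, not the implementation under development.

```python
import math
import random


def swap_log_ratio(beta_i, beta_j, loglik_i, loglik_j):
    """Log acceptance ratio for exchanging the states of chains i and j."""
    return (beta_i - beta_j) * (loglik_j - loglik_i)


def maybe_swap(states, logliks, betas, i, j, rng):
    """Propose a swap between chains i and j; returns True if accepted."""
    log_ratio = swap_log_ratio(betas[i], betas[j], logliks[i], logliks[j])
    if math.log(1.0 - rng.random()) < log_ratio:
        states[i], states[j] = states[j], states[i]
        logliks[i], logliks[j] = logliks[j], logliks[i]
        return True
    return False


# The hotter chain (beta = 0.5) has found a better fit, so the swap is accepted.
states, logliks, betas = ["cold state", "hot state"], [-10.0, -4.0], [1.0, 0.5]
print(maybe_swap(states, logliks, betas, 0, 1, random.Random(0)), states)
```

Hot chains see a flattened likelihood and move more freely between modes, and accepted swaps pass their discoveries down to the chain at beta = 1, whose samples are the ones used for inference.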